72. Edit Distance
Hard

Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.

You have the following three operations permitted on a word:

  • Insert a character
  • Delete a character
  • Replace a character

 

Example 1:

Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation: 
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')

Example 2:

Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation: 
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')

 

Constraints:

  • 0 <= word1.length, word2.length <= 500
  • word1 and word2 consist of lowercase English letters.
Accepted
364,952
Submissions
764,293
Seen this question in a real interview before?
Companies

Average Rating: 3.82 (98 votes)

Premium

Solution


Intuition

The edit distance algorithm is very popular among the data scientists. It's one of the basic algorithms used for evaluation of machine translation and speech recognition.

The naive approach would be to check for all possible edit sequences and choose the shortest one in-between. That would result in an exponential complexity and it's an overkill since we actually don't need to have all possible edit sequences but just the shortest one.


Approach 1: Dynamic Programming

The idea would be to reduce the problem to simple ones. For example, there are two words, horse and ros and we want to compute an edit distance D for them. One could notice that it seems to be more simple for short words and so it would be logical to relate an edit distance D[n][m] with the lengths n and m of input words.

Let's go further and introduce an edit distance D[i][j] which is an edit distance between the first i characters of word1 and the first j characters of word2.

edit_distance

It turns out that one could compute D[i][j], knowing D[i - 1][j], D[i][j - 1] and D[i - 1][j - 1].

There is just one more character to add into one or both strings and the formula is quite obvious.

If the last character is the same, i.e. word1[i] = word2[j] then

D[i][j]=1+min(D[i1][j],D[i][j1],D[i1][j1]1) D[i][j] = 1 + \min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1] - 1)

and if not, i.e. word1[i] != word2[j] we have to take into account the replacement of the last character during the conversion.

D[i][j]=1+min(D[i1][j],D[i][j1],D[i1][j1]) D[i][j] = 1 + \min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

So each step of the computation would be done based on the previous computation, as follows:

compute

The obvious base case is an edit distance between the empty string and non-empty string that means D[i][0] = i and D[0][j] = j.

Now we have everything to actually proceed to the computations

Current
1 / 19

Complexity Analysis

  • Time complexity : O(mn)\mathcal{O}(m n) as it follows quite straightforward for the inserted loops.
  • Space complexity : O(mn)\mathcal{O}(m n) since at each step we keep the results of all previous computations.

Report Article Issue

Comments: 48
robta00's avatar
Read More

Thanks to the guy who contributes to this answer, but when word[i] == word[j], it's simply dp[i][j] = dp[i-1][j-1]. The way the answer is written might look more consistent in two cases but it's more confusing.

102
Show 11 replies
Reply
Share
Report
tomasnovella's avatar
Read More

Reaaaaally ? You guys don't even mention in the solution that it's a good old Levenshtein distance ?
Please if you use some known algorithm, at least mention it!

252
Show 6 replies
Reply
Share
Report
sjw214's avatar
Read More

Plain English Description w/ JavaScript Solution:

Effectively, what the solution above is describing is the creation of a matrix/table that has inputs for all preceding inputs. One crucial step here is that the "base case" starts off with the empty String.

IE: When comparing "" vs "kitten", we have 6 changes, because we need to insert every character of kitten. So, 1 for "" vs "k", 2 for "" vs "ki", etc. We do the same thing by comparing the empty string to "sitting".

Let's start by iterating over our Strings, i for word1, and j for word2, and comparing each string to ' '. Our initial table should look like.

? | ' ' | k | i | t | t | e | n
' ' | 0 | 1 | 2 | 3 | 4 | 5 | 6
s | 1
i | 2
t | 3
t | 4
i | 5
n | 6
g | 7

Great! We have the foundation to build our solution. So comparing "k" to "s", we first want to check what the minimum solution of the previous strings were (' ' vs ' ', ' ' vs 'k', ' ' vs 's' ). We see that the minimum prior solution is in the diagonal of our table. So, we retrieve this value (table[i - 1][j - 1] which is currently 0) and use it. But since k does not equal s, we add 1, IE, our first step in comparing k vs s should look like:

? | ' ' | k | i | t | t | e | n
' ' | 0 | 1 | 2 | 3 | 4 | 5 | 6
s | 1 | 1 |
i | 2
t | 3
t | 4
i | 5
n | 6
g | 7

We can follow this same logic moving forward, comparing the previous solutions (left, diagonal, top with respect to table cells) and adding 1. I've filled in the table up to where we encounter "ki" vs "si". This is where we have to apply a slight modification.

? | ' ' | k | i | t | t | e | n
' ' | 0 | 1 | 2 | 3 | 4 | 5 | 6
s | 1 | 1 | 2 | 3 | 4 | 5 | 6
i | 2 | 1 |
t | 3
t | 4
i | 5
n | 6
g | 7

Because the ith and jth character match one another, we simply use the diagonal value as there are no new changes required. IE: "ki" vs "si" has the same solution as "k" vs "s" since the "i" is the same.

? | ' ' | k | i | t | t | e | n
' ' | 0 | 1 | 2 | 3 | 4 | 5 | 6
s | 1 | 1 | 2 | 3 | 4 | 5 | 6
i | 2 | 1 | 1
t | 3
t | 4
i | 5
n | 6
g | 7

We can continue with this logic filling in our table, and at the end our answer will be in the last diagonal cell! Here is the code:

/**
 * @param {string} word1
 * @param {string} word2
 * @return {number}
 */
var minDistance = function(word1, word2) {
     const dist = [[]];
    
     // we've added plus one to account for the empty string
     for (let i = 0; i <= word1.length; i++) {
         if (!dist[i]) { // because initializing empty Arrays with current sizes in JavaScript is weird
             dist[i] = [];
         }
         
         for (let j = 0; j <= word2.length; j++) {
             if (i === 0) {
                 // i is the empty string, so 1 insert per char of j
                 dist[i][j] = j;
             } else if (j === 0) {
                 // j is the empty string, so 1 insert per char of i
                 dist[i][j] = i;
             } else if (word1[i - 1] === word2[j - 1]) {
                 // the last character matches, so use the diagonal
                 // this is because the no changes required from previous
                 dist[i][j] = dist[i - 1][j - 1];
             } else {
                 dist[i][j] = Math.min(1 + Math.min(
                    dist[i - 1][j],
                    dist[i][j - 1],
                    dist[i - 1][j - 1]
                 ));
             }
         }
     }
    
    return dist[word1.length][word2.length];
};

Hope this helps someone!

51
Show 5 replies
Reply
Share
Report
ect582's avatar
Read More

Please do not provide spaghetti code like this:
if (n * m == 0)
return n + m;
just write two if statements, which are much better.

99
Show 2 replies
Reply
Share
Report
fawadsuhail's avatar
aramik's avatar
Read More

For all the people who want to have a better understanding of this problem I will refer to Algorithm Design Manual book by Steven Skiena page 282 or section 8.2.2.

11
Reply
Share
Report
priyanshu25's avatar
Read More

For all those who are stuck can refer this video https://youtu.be/WgmZ-5qAHJ8 . Found this really useful and learnt how to approach such DP Problems.

4
Reply
Share
Report
bupt_wc's avatar
Read More

Hi, @andvary , I'm very curious about how the short video in this solution is made.
Because I have been trying to write solutions, I always wanted to make such a video to describe my ideas.
Can you tell me what the name of this video is, and it would be better if you could recommend a software for making this video.
Thanks a lot.

4
Show 1 reply
Reply
Share
Report
vulkum's avatar
Read More

You get -10000 points for the poor naming of variables. Here, a much nicer version of the algorithm:
https://www.***.org/edit-distance-dp-5/

2
Reply
Share
Report
ruinart's avatar
Read More

O(n)-space Python:

n = len(word1)
dp = [i for i in range(n + 1)]
for i in range(1, len(word2) + 1):
    p, dp[0] = dp[0], i
    for j in range(1, n + 1):
        t = dp[j]
        dp[j] = p if word1[j - 1] == word2[i - 1] else min(p, dp[j], dp[j - 1]) + 1
        p = t
return dp[n]

3
Show 4 replies
Reply
Share
Report
Time Submitted
Status
Runtime
Memory
Language
03/18/2021 08:05Accepted8 ms7.2 MBcpp
06/14/2020 15:42Accepted24 ms8.9 MBcpp
/23
Your previous code was restored from your local storage.Reset to default
Contribute
|||
Saved
Trie
This list is empty.